DOCUMENT RESUME

ED 097 376                                    TM 004 018

AUTHOR          Jaeger, Richard M.
TITLE           A Primer on Sampling for Statewide Assessment.
INSTITUTION     Educational Testing Service, Princeton, N.J. Center for Statewide Educational Assessment.
PUB DATE        73
NOTE            60p.
AVAILABLE FROM  Center for Educational Assessment, Educational Testing Service, Princeton, N.J. 08540 (Free)
EDRS PRICE      MF-$0.75 HC-$3.15 PLUS POSTAGE
DESCRIPTORS     Definitions; *Educational Assessment; *Guides; Objectives; *Sampling; *State Surveys

ABSTRACT 

This paper is a primer on sampling procedures for statewide assessment. The careful reader should gain substantial knowledge about the promises and pitfalls of sampling for assessment. The primer has three basic objectives: (1) to define terms and concepts basic to sampling theory and its application, including population, sampling unit, sampling frame, probability sampling procedures, estimate, population parameter and estimator, estimator bias, variance, mean square error and efficiency, and consistency; (2) to illustrate some of the ways sampling procedures can be used to achieve realistic assessment objectives; and (3) to describe issues that arise when sampling procedures are used, and the factors that contribute to their resolution. Objectives two and three include discussions of simple random sampling, stratified random sampling, systematic sampling, cluster sampling, and matrix sampling. The appendix gives an example of an evaluation of alternative cluster sampling procedures. (SE)




A Primer 



on Sampling for 
Statewide Assessment 



Richard M. Jaeger 




CENTER FOR STATEWIDE EDUCATIONAL ASSESSMENT 
EDUCATIONAL TESTING SERVICE • PRINCETON, NEW JERSEY 




Copyright © 1973 by Educational Testing Service 


Educational Testing Service is an Equal Opportunity Employer



A PRIMER ON SAMPLING FOR STATEWIDE ASSESSMENT



Published by the Center for Statewide Educational
Assessment, which is supported by funds from the
Ford Foundation.



Richard M. Jaeger 

Visiting Research Psychologist 

Educational Testing Service 



This paper is a brief introduction to finite population sampling methods. It is specially prepared for those concerned with statewide assessment programs. The sampling procedures described in the paper are those most likely to be useful in achieving the objectives of statewide assessment.

The paper is intentionally non-mathematical. While it presumes knowledge of the fundamental concepts of statistical inference, it does not require any prior exposure to the formalities of sampling. All sampling terms used in the paper are carefully defined. Descriptions of sampling procedures make use of these definitions, and avoid unnecessary technicalities. The paper is intended for those engaged in the practice of statewide assessment, and makes no claim to comprehensiveness as a theoretical treatise.

Helpful suggestions and clarifications of some otherwise opaque issues were provided by Nancy Bruno, Paul Campbell, Henry Dyer and Robert Linn. I want to express my appreciation for their careful reviews of early drafts. I am solely responsible for any remaining inaccuracies.

Princeton, New Jersey                              Richard M. Jaeger



TABLE OF CONTENTS

About this Paper

Some Terms and Concepts

    Population
    Sampling Unit
    Sampling Frame
    Probability Sampling Procedures
    Estimate, Population Parameter and Estimator
    Estimator Bias
    Variance, Mean Square Error and Efficiency
    Consistency

Using Sampling in Statewide Assessment

    Simple Random Sampling
    Stratified Random Sampling
    Systematic Sampling
    Cluster Sampling
    Matrix Sampling

Summary

References

Appendix A: Evaluation of Alternative Cluster Sampling Procedures: An Example



A Primer on Sampling for Statewide Assessment 
About this Paper 

When a statewide assessment is planned, one of the first issues that arises is, who should be tested? Even after a state has decided to test students in certain grades or at certain age-levels, the question, who should be tested?, remains. Should all fourth-graders be tested, or should some be selected for testing?

In some states, the objectives and purposes that give rise to assessment include a desire to secure test results for each student in a grade; the assessment goals include individual assessment as well as institutional assessment. When individual assessment is desired, the "who to test" question is answered by the selection of a grade or age-level for assessment. When individual measurement is not a goal of statewide assessment, it is usually economical and administratively desirable to select a sample of students for testing, rather than testing all students.

This paper is intended to be a primer on sampling for statewide assessment. If its purpose is achieved, the careful reader will gain substantial knowledge about the promises and pitfalls of sampling for assessment. The reader will not become an instant sampling expert; no short paper can accomplish that goal. Instead, the dedicated reader will become a "sampling conversationalist", able to meet a sampling expert at least half way, and able to knowledgeably discuss sampling issues important to his state's assessment. Further, he will be able to converse in the language of the expert.

The goal of creating "sampling conversationalists" will be pursued in three ways:

1) by defining terms and concepts basic to sampling theory and its applications;

2) by illustrating some of the ways sampling procedures can be used to achieve realistic assessment objectives; and

3) by describing issues that arise when sampling procedures are used, and the factors that contribute to their resolution.

The balance of this paper is in two parts. The first part provides definitions of some of the most important terms and concepts fundamental to the language of sampling. In the second, consideration is given to the potential objectives of a statewide assessment, and the ways various sampling procedures can contribute to their achievement. In part two, the reader is faced with alternatives and choices, and then presented with facts to help him make decisions.
Some Terms and Concepts

Population

In any sampling study, there is a definable group or aggregation of elements from which samples are selected. This aggregation of elements is called the population of the study. Technically, any aggregation of elements that have at least one attribute in common can form a population. In a statewide assessment, some examples of populations that might be of interest are all public schools in the state that enroll sixth-graders, all sixth-graders enrolled in public schools in the state, and all public-school sixth-graders in the state who are children of migrant agricultural workers. From these examples, it is clear that populations can be composed of individuals or institutions. Similarly, populations can be composed of people or things. The first population, all public schools in the state that enroll sixth-graders, is defined by two attributes: control of school (public) and grade-level offerings (sixth grade); the second population is also defined by two attributes: grade-level and public-school enrollment; the third population has three defining attributes: grade-level, public-school enrollment, and parental occupation.

These examples of populations have some important characteristics in common. Each is composed of a finite number of elements (sixth-graders in the state, schools with sixth-graders in the state, etc.), and each is defined by attributes that are easily recognized. That is, one can easily decide whether an element is or is not a member of the population.

Some populations that are infinite in size may be encountered in a statewide assessment. An example of an infinite population is "all multiple-choice test items that could ever be written, that purport to measure reading comprehension". In contrast to the first examples, this population is not defined by attributes that are easily recognized. If faced with a test item that contained a paragraph of prose followed by four questions on the main theme of the paragraph, most of us would say that the item was a "reading comprehension" item, and therefore a member of the population. But what about an arithmetic word problem... "If it took six men five days to dig a ditch...?". Clearly, reading comprehension is a skill required to answer the item correctly. Yet it requires more than reading comprehension to compute a correct solution. Is the item a member of the population? The answer is debatable.



All of the sampling procedures discussed in this paper assume that the populations to be sampled are finite. This is a realistic assumption whenever students, classes, schools or school districts are sampled. Unlike finite populations, infinite populations are somewhat intangible, and exist only in the mind of the beholder. However, there is a well-developed theory of sampling from infinite populations, so they present no insurmountable statistical problems.

Another way of defining a population is "the aggregation of elements that is of central interest in a study". This is an admittedly loose definition that might upset some statistical purists, but it helps to point out the practical significance of populations. In a real-world study such as a statewide assessment, populations are not theoretically-defined entities that exist for the fascination of statisticians; they are the central focus of the study. For example, in your statewide assessment you may want to know the proportion of public-school fourth-graders whose reading comprehension score is below the 25th percentile of a national norm distribution. Here, the population of interest is all fourth-graders enrolled in the public schools of your state. The population is real, and of practical interest. If you test every public-school fourth-grader in the state, you can determine the proportion exactly (provided there are no missing data, all absentees are tested at a later date, etc.).
Sampling Unit

Populations are made up of elements termed sampling units. The sampling units into which the population is divided must be unique, in the sense that they do not overlap, and must, when aggregated, define the whole of the population of interest. Sampling units that might be used in statewide assessments include students, class-sections, homerooms, teachers, schools, and school districts. These examples of sampling units clearly define unique elements (one student is different from another; schools that have the same grade-levels are generally unique units) that can be readily counted and aggregated.

The definitions given for "population" and "sampling unit" may appear to be circular. But perhaps that's as it should be, since sampling units, aggregated, make up a population, and a population is an aggregation of sampling units.

Sampling Frame

When "selecting a sample", one is in fact selecting sampling units from the aggregation that composes the population. For a unit to be selected, it must be identifiable. A list that uniquely identifies all of the units in a finite population is termed a sampling frame. A sampling frame for statewide assessment might consist of a list of all schools in the state that enroll pupils in grades one through six, or a list of all secondary students enrolled full time in vocational education programs.

When assembling a sampling frame, care must be taken to ensure that it corresponds precisely to the population of interest. In the first example above, a sampling frame that consists of all schools in the state that enroll pupils in grades one through six would be composed of non-public schools as well as public schools. If the population of interest consisted only of public elementary schools, this sampling frame would be inappropriate. First, non-public schools would be listed in the frame although they are not elements of the population of interest. The erroneous listing of elements outside the population of interest is known as "overregistration". Second, the definition of an "elementary school" differs from state to state. In some states, a school is classified as an elementary school if it enrolls pupils in any grade between kindergarten and grade six. In other states, an elementary school is defined as a school that enrolls pupils in any grade between kindergarten and grade eight. In states with the latter definition, there may be schools that enroll only seventh and eighth-graders, that would be elements of a population of elementary schools. Yet these schools would be excluded from a sampling frame that listed schools with pupils in grades one through six. In this case, elements of the population of interest (all public elementary schools) would be excluded from the sampling frame (all schools that enroll pupils in grades one through six). This type of error in constructing a sampling frame is known as "underregistration".

The point to be made is that populations of interest in statewide assessment should be clearly and precisely defined. Then sampling frames that include only elements in the populations of interest, and all elements in the populations of interest, should be carefully constructed.
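The two frame errors described above can be checked mechanically once both the population and the frame are available as lists of identifiers. The following is a minimal sketch, not a procedure from the paper: the school identifiers and the function name `check_frame` are hypothetical, chosen only for illustration.

```python
# Hypothetical school identifiers; none of these come from the paper.
population = {"PS-01", "PS-02", "PS-03", "PS-07"}      # public elementary schools of interest
frame = {"PS-01", "PS-02", "PS-03", "NP-11", "NP-12"}  # schools listed in the sampling frame


def check_frame(population, frame):
    """Return (overregistered, underregistered) sets of units."""
    overregistered = frame - population    # listed in the frame, but outside the population
    underregistered = population - frame   # in the population, but missing from the frame
    return overregistered, underregistered


over, under = check_frame(population, frame)
print("Overregistration:", sorted(over))
print("Underregistration:", sorted(under))
```

A frame corresponds precisely to the population only when both sets come back empty.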
Probability Sampling Procedures

When sampling is used in statewide assessments, the financial objectives are clear. The desire is to save money and time by measuring or testing only a sample of students, yet be able to make accurate statements about a population of students. Probability sampling procedures often allow these objectives to be achieved, and in addition, allow one to determine the likelihood of making inaccurate statements about a population.

Probability sampling procedures have three characteristics in common. First, the procedures are applied to populations where the units which compose the population and the units which are excluded from the population are explicitly defined. That is, given a potential sampling unit, one can say unequivocally whether it is in the population or not. Second, the chances (or probability) of selecting any potential sample can be specified. Third, every sampling unit in the population has a positive chance of being selected. It isn't necessary that every potential sample have an equal chance of being selected, just that the chance of selecting any potential sample can be specified.

The formal definition of a probability sampling procedure might appear somewhat formidable, and perhaps unenlightening as well. Sometimes even simple things are obscured by formality (a square is a right parallelopiped, composed of four pairwise orthogonal line segments...). Instead of pursuing the definition further, consider some sampling methods that are not probability sampling procedures. Assume that an assessment objective is to determine the average social studies achievement of eighth-graders in each school district in the state. Suppose that a particularly large school district decides to test eighth-graders in half its schools and use their average achievement as an estimate of the average for all eighth-graders. Suppose they decide to select for testing those schools that are closest to the district's research office. With this plan, they'll select the school closest to the research office first, the second closest school second, and so on, until half the schools in the district have been "sampled". This isn't a probability sampling procedure, because it violates the third characteristic of such procedures. All schools with eighth-graders that are farthest from the district research office are contained in the sampling frame, but they don't have any chance (zero probability) of being selected. This same violation would occur with any sampling procedure that selects schools only from a prescribed section of the district.

These sampling procedures cause problems not because they violate an arbitrary rule, but because they are likely to produce samples that don't represent the population. The district research office is probably in the older or downtown area of the system. Schools near it are more likely to enroll students from lower socio-economic status families than in the district as a whole, and the achievement of these students is therefore likely to be lower than in the district as a whole. So again, the rules are not just statistical artifacts. They help to prevent trouble in the practical world of assessment.
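The third characteristic (every unit has a positive chance of selection) can be made concrete by computing each school's chance of entering the sample under the "closest schools" plan and under simple random sampling. This is an illustrative sketch only: the six school labels, their distances, and the helper `inclusion_probabilities` are assumptions, not data or methods from the paper.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical distances (in miles) from the district research office.
distance = {"A": 0.5, "B": 1.2, "C": 3.0, "D": 4.5, "E": 7.1, "F": 9.8}
n = len(distance) // 2  # the district tests half of its schools


def inclusion_probabilities(samples_with_prob, units):
    """Add up the probability of every potential sample that contains each unit."""
    probs = {u: Fraction(0) for u in units}
    for sample, p in samples_with_prob:
        for u in sample:
            probs[u] += p
    return probs


# Plan 1: always take the n schools closest to the office (one sample, probability 1).
closest = tuple(sorted(distance, key=distance.get)[:n])
nearest_plan = [(closest, Fraction(1))]

# Plan 2: simple random sampling, every subset of n schools equally likely.
all_samples = list(combinations(sorted(distance), n))
srs_plan = [(s, Fraction(1, len(all_samples))) for s in all_samples]

nearest_probs = inclusion_probabilities(nearest_plan, distance)
srs_probs = inclusion_probabilities(srs_plan, distance)
print("Closest-schools plan:", nearest_probs)
print("Simple random sampling:", srs_probs)
```

Under the first plan the distant schools have probability zero, so the plan fails the third characteristic; under the second plan every school has the same positive probability.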

Estimate, Population Parameter, and Estimator

In addition to providing procedures for collecting data, sampling theory provides formulas for estimating characteristics of populations, such as averages, proportions, and totals. When a sample is drawn from a population, and a statistic (such as an average) is computed from data on the units sampled, the number that results is called an estimate. For example, if it is found that a sample of ten students selected from a population of 200 has an average arithmetic score of 42, the number 42 is an estimate of the average for the entire population of 200. The average for the entire population would be an example of a population parameter. In general, population parameters are unknown characteristics of populations that survey researchers would like to know. If every element in a population is measured, the value of the population parameter can be determined. Instead of measuring every population element, a survey researcher will measure only elements in a sample and, from these data, compute an estimate of the population parameter. Formulas that are used to compute estimates from sample data are termed estimators.

In a statewide assessment, the average educational level of teachers in the state might be estimated by sending a questionnaire to a sample of teachers, and computing an average for the sampled teachers. An average computed from the questionnaire responses of the sample is an estimate, and the formula used to compute the average for the sample of teachers is an estimator.
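The distinction between an estimator (a formula) and an estimate (a number) maps naturally onto a function and its return value. The sketch below assumes a made-up population of 200 arithmetic scores generated at random; the scores, the seed, and the name `sample_mean` are illustrative assumptions, not material from the paper.

```python
import random


def sample_mean(values):
    """Estimator: the formula that is applied to the sample data."""
    return sum(values) / len(values)


random.seed(0)  # fixed seed so the sketch is repeatable
# Hypothetical arithmetic scores for a population of 200 students.
population_scores = [random.randint(20, 60) for _ in range(200)]

# Population parameter: computable here only because the whole (fictitious) population is known.
parameter = sample_mean(population_scores)

sample = random.sample(population_scores, 10)  # ten students drawn without replacement
estimate = sample_mean(sample)                 # estimate: the number the estimator produces

print("Population parameter:", parameter)
print("Estimate from the sample:", estimate)
```

In practice only the last two lines of computation are available, since the population parameter is unknown.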
Estimator Bias

When a population is finite, the number of different samples that can be drawn from it is also finite. A list can be made for any finite population, containing all of the samples of a given size that could possibly be drawn from it. For example, suppose that a school district has four high schools and an assessment director wants to sample two of the four. If the schools are numbered from one to four, the six different samples of two schools that could be drawn are as follows:

    Sample     Schools in Sample

      A             1, 2
      B             1, 3
      C             1, 4
      D             2, 3
      E             2, 4
      F             3, 4
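The list of six samples above can be generated rather than written out by hand. A minimal sketch using the standard library (the variable names are illustrative):

```python
from itertools import combinations

schools = [1, 2, 3, 4]

# All samples of two different schools, ignoring the order of selection.
samples = list(combinations(schools, 2))

for label, sample in zip("ABCDEF", samples):
    print(label, sample)
```

The same call with a larger list of schools or a different sample size enumerates every potential sample for that case, although the count grows quickly.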



Suppose the assessment director wants to know the average number of certified science teachers per high school in the district, and decides to estimate the average by collecting data in two of the four schools. In this example, the population parameter is the actual average per school for the four schools in the district. Data from each sample would provide an estimate of this population parameter, and since six different samples could be selected, six different estimates are possible.

Continuing the example, suppose that an estimate of the population average per school was actually calculated using data from each sample, and the six estimates were then tabulated. It would then be possible to calculate the average of these six estimates. If the average value of the estimates was equal to the population average, the estimator (formula used to calculate each estimate) would be an unbiased estimator. If, on the other hand, the average of the sample estimates was either larger or smaller than the population average, the estimator would be biased.

In general, an estimator is said to be biased if the average of the estimates it would produce (if the average were to be taken over all possible samples of a given size) were either larger or smaller than the population parameter. If the average of all estimates were to equal the population parameter, the estimator would be termed unbiased.

It should be intuitively clear that unbiased estimators are desirable. An assessment director would be happiest if every estimate computed from a sample was equal to the population parameter of interest. Since this Utopian condition will hardly ever be true, it is at least nice to have the average of the estimates equal the population parameter.

Although unbiased estimators are desirable, a biased estimator can sometimes be useful if the magnitude of the bias (the difference between the average estimate and the population parameter) is small. Under some conditions likely to be encountered in a statewide assessment, an unbiased estimator may actually be rejected in favor of a biased one.

At this point, the reader may wonder how estimator bias can be computed using data from a single sample. The answer is that it can't be computed from sample data. To compute bias, one would have to know the value of the population parameter. If the population parameter were known, there would be no reason to sample.

The bias (or lack of bias) of a sampling and estimation procedure is actually determined from the estimator used (a mathematical formula), and the mathematical assumptions that underlie the sampling procedure. Determination of bias is an algebraic procedure that doesn't depend upon data at all (Murthy, 1967; Cochran, 1963).

NUMERICAL EXAMPLE*: Suppose that the average number of certified science teachers per school was known to be equal to 3.5 for the four schools in the district, and the estimates computed for the six possible samples were as follows:

    Sample     Schools in Sample     Estimate

      A             1, 2               4.3
      B             1, 3               3.2
      C             1, 4               2.8
      D             2, 3               3.7
      E             2, 4               3.2
      F             3, 4               3.9

The average of the six estimates would equal

    (4.3 + 3.2 + 2.8 + 3.7 + 3.2 + 3.9)/6 = 21.1/6 = 3.52

The estimator used would then be slightly biased, since the true value of the population parameter is 3.50, and the average of the estimates produced by all possible samples of size two is 3.52. The magnitude of the bias is equal to the difference between the population parameter value and the average of the six estimates: 3.50 - 3.52 = -0.02.
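The arithmetic of this numerical example can be reproduced in a few lines, using the six hypothetical estimates (4.3, 3.2, 2.8, 3.7, 3.2, 3.9) and the assumed population parameter of 3.5. The sketch follows the sign convention of the example (population parameter minus average estimate); the variable names are illustrative.

```python
# Hypothetical estimates for the six possible samples of two schools.
estimates = {"A": 4.3, "B": 3.2, "C": 2.8, "D": 3.7, "E": 3.2, "F": 3.9}
parameter = 3.5  # assumed known only for the sake of the example

average_estimate = sum(estimates.values()) / len(estimates)

# Sign convention of the example: parameter minus the average of all estimates.
bias = parameter - average_estimate

print("Average of estimates:", round(average_estimate, 2))
print("Bias:", round(bias, 2))
```

Rounded to two decimals, the results agree with the 3.52 and -0.02 of the example; the unrounded average is 21.1/6, slightly below 3.52.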



*In this numerical example and in those that follow, hypothetical data are used. It is critically important to recognize that these examples have been constructed solely to illustrate the definitions of sampling concepts presented in the main body of the paper. Each example assumes a situation that is totally fictitious, and unlike the situations that will be encountered in practice. Namely, it is always assumed that the values of population parameters are known, and that estimates are available for all of the samples that could possibly be selected.

In a practical sampling situation, population parameters will not be known. (If they were known, sampling would be unnecessary.) Additionally, only one sample will be selected, and only one estimate of the population parameter will be computed. The variance of the sample estimate (see the following section of the text) will not be directly computable from the data provided by a single sample. However, the variance of the sample estimate can almost always be estimated from the data provided by a single sample, and this estimate will almost always be computed in practice.

Variance, Mean Square Error and Efficiency

When an estimate of a population parameter is computed, it will rarely be equal to the population parameter. The difference between the estimate and the population parameter is known as an error of estimation. In the numerical example of the last section, the average number of certified science teachers per school was assumed to be equal to 3.5 for the four schools in the district, and the estimate computed from Sample F was assumed to be 3.9. With these assumptions, the error of estimation would be (3.5) - (3.9) or -0.4.

If an estimator is unbiased, its variance is equal to the average of the squared errors of estimation, when the average is computed over all possible samples of a given size. Suppose that the estimator in the example of the last section had been unbiased. Then applying this formula for variance, the error of estimation would be computed for each of the six sample estimates, each of these would be squared, and the average of the six squared errors would equal the variance.

For a given sampling procedure and samples of a given size, the most desirable unbiased estimator is the one with the smallest variance. The smaller the variance of an unbiased estimator, the smaller the chance that a large estimation error can occur.

When an estimator is biased, its variance is also defined as the average of squares of differences. But instead of squaring the difference between each estimate and the population parameter, the variance of a biased estimator requires that the difference between each estimate and the average of all estimates be squared. The average of the squares of these differences is taken over all potential samples of a given size.
NUMERICAL EXAMPLE: Consider once again the hypothetical data presented in the last numerical example. In that example, the average number of certified science teachers per school was assumed to equal 3.5 in a school district with four schools. All possible samples of two schools were identified, and estimates of the average number of certified science teachers per school were assumed to be as follows:

    Sample     Estimate

      A          4.3
      B          3.2
      C          2.8
      D          3.7
      E          3.2
      F          3.9

The average of these estimates was found to equal 3.52. These data may now be used to compute the variance of the estimator:

                           Difference Between          Square of
    Sample    Estimate     Estimate and Average        Difference

      A         4.3        4.3 - 3.52 =  0.78           0.6084
      B         3.2        3.2 - 3.52 = -0.32           0.1024
      C         2.8        2.8 - 3.52 = -0.72           0.5184
      D         3.7        3.7 - 3.52 =  0.18           0.0324
      E         3.2        3.2 - 3.52 = -0.32           0.1024
      F         3.9        3.9 - 3.52 =  0.38           0.1444

                                  Sum of Squares:       1.5084

    Variance of Estimator = (1.5084)/(6) = 0.2514
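The variance table can be recomputed directly from the six hypothetical estimates. This sketch rounds the average to 3.52 before taking differences, as the hand calculation does; the variable names are illustrative.

```python
# Hypothetical estimates for samples A through F.
estimates = [4.3, 3.2, 2.8, 3.7, 3.2, 3.9]

# Average of all estimates, rounded to two decimals as in the hand calculation.
average = round(sum(estimates) / len(estimates), 2)

# Squared difference between each estimate and the average of all estimates.
squared_differences = [(e - average) ** 2 for e in estimates]

sum_of_squares = sum(squared_differences)
variance = sum_of_squares / len(estimates)

print("Sum of squares:", round(sum_of_squares, 4))
print("Variance of estimator:", round(variance, 4))
```

Note that the divisor is the number of potential samples (six), since the average is taken over all potential samples rather than estimated from one of them.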



The definitions of variance for biased estimators and unbiased estimators are illustrated by Figures 1A and 1B, below. Each figure shows a distribution of estimates across all potential samples from a population.



[Figure 1A: Distribution of estimates for a biased estimator. Horizontal axis: size of estimate. Marked on the axis: the value of a particular sample estimate, the average of sample estimates, and the value of the population parameter. The difference used in calculating variance is also marked.]

[Figure 1B: Distribution of estimates for an unbiased estimator. Horizontal axis: size of estimate. Marked on the axis: the value of the population parameter, which coincides with the average of sample estimates, and the value of a particular sample estimate. The difference used in calculating variance is also marked.]



In Figure 1A, the average of all estimates and the population parameter have different values, and the difference between them is equal to the bias of the estimator. In Figure 1B, the average of all estimates and the population parameter have the same value, since the estimator is unbiased.

If an assessment director has a choice of using two unbiased estimators, the one with the smallest variance should be selected. But what if the choice is between a biased estimator and an unbiased estimator? The biased estimator may have the smallest variance but its bias may be large, and the proper choice is unclear. The assessment director needs some way of comparing the magnitude of estimation errors of biased and unbiased estimators. A useful measure for this purpose is called the mean square error. Mean square error equals the sum of the estimator variance and the square of the estimator bias.

    Mean square error = Variance + (Bias)^2

NUMERICAL EXAMPLE: Using the data of the previous numerical examples in the formula for the mean square error,

    Mean square error = 0.2514 + (-0.02)^2 = 0.2514 + 0.0004 = 0.2518

In this numerical example, the mean square error of the estimator is almost completely dominated by the variance. Although the estimator is biased, the magnitude of the bias is so small that it contributes an insignificant amount to the mean square error.



For an unbiased estimator, the mean square error and the variance are equal, since the bias is zero.
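The relation between mean square error, variance, and bias can be checked numerically with the six hypothetical estimates (4.3, 3.2, 2.8, 3.7, 3.2, 3.9) and the assumed parameter of 3.5. The sketch below computes the mean square error two ways: from its parts, and directly as the average squared error of estimation. It uses the unrounded average of the estimates, so the last decimal differs slightly from a hand calculation based on the rounded figures 0.2514 and -0.02, which gives 0.2518.

```python
# Hypothetical estimates for samples A through F and the assumed population parameter.
estimates = [4.3, 3.2, 2.8, 3.7, 3.2, 3.9]
parameter = 3.5
k = len(estimates)

average = sum(estimates) / k                              # unrounded average of all estimates
variance = sum((e - average) ** 2 for e in estimates) / k
bias = parameter - average

# Mean square error assembled from its parts: variance plus squared bias.
mse_from_parts = variance + bias ** 2

# Mean square error computed directly: average squared error of estimation.
mse_direct = sum((e - parameter) ** 2 for e in estimates) / k

print("Variance + (Bias)^2:", round(mse_from_parts, 4))
print("Average squared error:", round(mse_direct, 4))
```

The two figures agree, and setting the bias to zero reduces the first expression to the variance alone.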

For a given sample size, an estimator that has a smaller mean square error than another is said to be more efficient. For a given sampling procedure, the most efficient estimator should always be used, since it will provide the smallest estimation errors, on the average. When different sampling procedures are used, a less efficient estimator may be preferred if its sampling procedure is less costly or more convenient. In the practical world of statewide assessment, it may be worthwhile to take a larger sample if the sampling procedure that can be used is more administratively convenient or less expensive to complete.

Consistency * 

Some amount of error in the estimation of population parameters from
sample data is almost inevitable. However, the magnitude of errors likely
to occur can often be controlled. With some sampling and estimation
procedures, the mean square error can be reduced by drawing
larger and larger samples, and estimation error is reduced to zero when
the sample size equals the population size. Such procedures are said to
provide consistent estimation. A sampling and estimation procedure is
said to be inconsistent if sampling errors can occur even when the sample
size equals the population size.

When lack of consistency is encountered in practice, the sampling is
usually being done "with replacement". In a "with replacement" procedure,
an element of a population can enter the same sample more than once.
Although lack of consistency can occur when elements are sampled without
replacement (once an element is sampled it is removed from the population),
it is not encountered in practical problems.




As an example of a "with replacement" sampling procedure, consider
the case discussed in conjunction with estimator bias, above. In that
example, two schools were sampled from a population of four schools. If
sampling were to be done with replacement, ten different samples of two
schools could be drawn. In addition to the six samples listed in the
previous example, the following are possibilities:

          Sample          Schools in Sample

            G                  1, 1
            H                  2, 2
            .                   .
            .                   .

More to the point, one could select many different samples of four
schools, such as:



          Sample          Schools in Sample

            A                1, 2, 3, 3
            B                1, 1, 2, 3
            C                1, 1, 3, 4
            D                1, 1, 1, 1
            E                3, 4, 4, 4



Unless the number of certified science teachers was the same in all
schools, each of these samples would provide a different estimate of the
average number of science teachers per school. As a result, sampling
errors could occur even though the sample size and the population size
were the same.
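
This point can be illustrated by listing every with-replacement sample of four
schools and computing the estimate each one gives. The sketch below is a
minimal illustration in Python; the teacher counts assigned to the four
schools are hypothetical and are not taken from the paper's earlier examples.

```python
from itertools import product

# Hypothetical numbers of certified science teachers in the four schools.
teachers = {1: 1, 2: 2, 3: 3, 4: 6}
true_mean = sum(teachers.values()) / len(teachers)

# All ordered samples of four schools drawn with replacement
# (4**4 = 256 samples, each equally likely).
samples = list(product(teachers, repeat=4))
estimates = [sum(teachers[school] for school in sample) / 4 for sample in samples]

distinct = sorted(set(estimates))
mse = sum((e - true_mean) ** 2 for e in estimates) / len(estimates)

# Without replacement there is only one sample of four schools: the census.
census_estimate = sum(teachers.values()) / 4

print("distinct estimates with replacement:", len(distinct))
print("smallest and largest estimate:", distinct[0], distinct[-1])
print("mean square error with replacement at n = N:", mse)
print("census estimate equals true mean:", census_estimate == true_mean)
```

Samples such as 1, 1, 1, 1 and 3, 4, 4, 4 sit at opposite ends of the range of
estimates, so the mean square error stays above zero even though four schools
are drawn from a population of four.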



Lack of consistency becomes a problem of real concern in two situations:
first, when the mean square error of an estimator is not reduced in size in
some orderly way as the sample size is made larger and larger; second,
when the size of the sample necessary to achieve an acceptable mean square
error is close to the size of the population. Several sampling and
estimation procedures that are otherwise attractive for statewide assessment
may produce these problems in some situations. These procedures, and the
potentially problematic conditions, are described in the next part of this
paper.

     NUMERICAL EXAMPLE: Consider once again the hypothetical
     situation described in previous numerical examples, but
     suppose that a "with replacement" sampling procedure is
     used. Assume that all samples of size one, two, three,
     and four schools are selected, and the mean square error
     of the estimator is computed for each sample size. Suppose
     that the results are as follows:

          Sample Size          Mean Square Error

               1                     1.25
               2                     0.64
               3                     0.88
               4                     0.22

     This example illustrates two kinds of inconsistency.
     First, the mean square error does not become progressively
     smaller as the sample size is increased; the mean square
     error for samples of three schools is larger than the mean
     square error for samples of two schools. Second, the
     mean square error is larger than zero for samples of
     four schools, even though there are only four schools
     in the population.

     Clearly, the first kind of inconsistency is intolerable.
     A sampling researcher never knows how large the mean square
     error will be, although it can be estimated for many
     sampling procedures. Unless estimates are made for every
     possible sample size (which is sometimes impossible), the
     researcher cannot determine an appropriate sample size with
     any degree of confidence; a large sample may be less
     efficient than a small sample.

Using Sampling in Statewide Assessment

Whether sampling is useful for statewide assessment depends primarily
on the objectives of the assessment, and secondarily on the capabilities of
those conducting the assessment. For some assessment purposes, usually when
assessment results are desired for individual students, sampling will not be
useful at all. For other purposes, as when assessment results are desired
for individual classrooms, sampling may be feasible but impractical. But
for many assessment purposes, sampling will be not only feasible, but a
practical route to saving time, dollars and effort.

The capabilities of the agency conducting the assessment have been
deemed secondary when considering the usefulness of sampling, since
considerable help, through consultants or outside agencies, is likely to be
readily available. Further, the costs of such assistance are likely to be
more than repaid through the savings afforded by sampling.

Some sampling procedures are both feasible and practical for some
assessment purposes, but infeasible or impractical for others. For
example, simple random sampling (which is discussed below) may be
impractical for determining the average achievement of pupils in a
particular grade throughout a state (the impracticality stems from the need
for a single list of all pupils enrolled throughout the state), but
practical and feasible for determining the average achievement of pupils
in a particular grade in each school in the state. In the latter case,
separate simple random samples might be selected from each school, using
readily available lists in each school district.

To this point, this paper has been concerned with the language of
sampling: basic terms and concepts necessary to an understanding of
sampling and samplers. We shall now change course by considering two practical
assessment objectives gleaned from actual state assessment reports, and
describing how sampling procedures could be used in accomplishing these
objectives.

Objective 1: Determining the Average Reading Achievement of All
Fifth-Graders in the State

An obvious way of determining the average reading achievement of all
fifth-grade pupils in a state is to test them all, record their scores, and
compute the average. This procedure, known as taking a census of
fifth-graders, was actually followed in the state that reported this objective.
For many objectives, and particularly when estimating statewide averages,
taking a census is wasteful.

Simple Random Sampling

One procedure that could be used to achieve Objective 1 is called
simple random sampling, a procedure in which every potential sample has
an equal chance of being selected. Merely computing the arithmetic
average of data from a simple random sample will provide an estimate of
the population average. This sampling and estimation procedure is
unbiased and consistent, and there are well-known formulas for estimating
the mean square error of the sample average (Hansen, Hurwitz and Madow,
1953).

To estimate the average reading achievement of fifth-graders in a
state through simple random sampling, the procedure would be as follows.
First, a sampling frame would be constructed by listing each fifth-grader
enrolled in the state, and assigning a unique number to each listed pupil.
The sampling frame would include all enrolled fifth-graders or only
fifth-graders enrolled in public schools, depending on the population of
interest. Once the sampling frame was constructed, a table of random numbers
would be used to select a sample of the desired size. A number would be drawn
from the random number table, and the pupil with the corresponding number
would be added to the sample. If a number drawn from the table either
exceeded the largest number on the list of pupils, or repeated a number
already drawn, it would be discarded. Selection of random numbers from
the table and corresponding pupils from the list would continue until
the desired sample size was reached.
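
The selection rule just described translates almost directly into a short
program. The following is a minimal sketch in Python in which a pseudo-random
number generator stands in for the printed table of random numbers; the
function name, the frame of 2,500 pupils and the four-digit table entries are
assumptions made for illustration.

```python
import random

def simple_random_sample(frame, sample_size, max_table_number, seed=None):
    """Select pupils as one would with a table of random numbers.

    Numbers are drawn one at a time; a number is discarded if it exceeds
    the largest pupil number in the frame or repeats a number already drawn.
    """
    if sample_size > len(frame) or max_table_number < len(frame):
        raise ValueError("frame is too small or table numbers are too short")
    rng = random.Random(seed)  # stands in for the printed table
    chosen = []
    seen = set()
    while len(chosen) < sample_size:
        number = rng.randint(1, max_table_number)  # read the next table entry
        if number > len(frame) or number in seen:
            continue  # discard and move on to the next entry
        seen.add(number)
        chosen.append(number)
    # Pupil number k is stored at position k - 1 of the frame.
    return [frame[k - 1] for k in chosen]

# Hypothetical frame: pupils numbered 1 through 2,500.
frame = [f"pupil {i}" for i in range(1, 2501)]
sample = simple_random_sample(frame, sample_size=200,
                              max_table_number=9999, seed=1)
print(len(sample), "pupils selected;", len(set(sample)), "are distinct")
```

Discarding repeated numbers is what makes the procedure "without
replacement"; discarding numbers beyond the end of the list keeps every
listed pupil equally likely to be chosen.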

A practical problem that we have skirted so far will arise time and time
again in sampling. Just what is the "desired sample size" and how can it
be determined? With simple random sampling, the desired sample size can be
computed through straightforward application of a formula given by Hansen,
Hurwitz and Madow (1953), Cochran (1963) or in many other books on sampling.
Rather than stating the formula here, we will consider some of the factors
that enter into it. First of all, the size of a sample that's required to
estimate a population parameter depends on the magnitude of the estimation
errors that can be tolerated. The entire population must be sampled if the
parameter must be known exactly. If a sample is taken, there will almost
always be some estimation error, and for some samples the error may be very
large. Since simple random sampling is consistent, the variance of
estimation errors can be reduced by increasing the sample size.

Three factors enter the sample size formula for simple random sampling:
the size of the population, the variance of the variable that is to be
estimated, and the size of the estimation error that can be tolerated. Some
rules of thumb for these factors are as follows. The larger the population
size, the smaller the percentage that must be sampled in order to realize
an estimator variance of a given size. For example, with a population of
100 pupils it might be necessary to sample 50 percent (or 50 out of 100),
but with a population of 10,000 pupils it might only be necessary to sample
one percent (or 100 out of 10,000) to realize a given estimator variance.
The larger the variance of the variable for which a parameter is to be
estimated, the larger the sample size required to achieve a given estimator
variance. This is intuitively reasonable. If the variable (for
Objective 1, reading achievement) has a large variance, estimates will
fluctuate greatly from sample to sample; a larger sample size will be
required to reduce its average fluctuations. Finally, the smaller the
estimation error that can be tolerated, the larger will be the required
sample size. Again, this rule is intuitively reasonable.
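
These rules of thumb can be seen at work in the sample size formula itself.
The paper does not state the formula; the sketch below, in Python, uses the
standard form found in sampling texts, in which a large-population sample
size n0 = (z S / e)² is reduced by a finite population correction. The
standard deviation of 15 score points, the tolerable error of 3 points and
the multiplier z = 2 are assumed values chosen for illustration.

```python
import math

def srs_sample_size(population_size, std_dev, tolerable_error, z=2.0):
    """Sample size for estimating an average by simple random sampling.

    n0 = (z * S / e)**2 is the size needed for a very large population;
    the finite population correction n0 / (1 + n0 / N) shrinks it for a
    population of N elements. The result is rounded up to a whole pupil.
    """
    n0 = (z * std_dev / tolerable_error) ** 2
    n = n0 / (1 + n0 / population_size)
    return math.ceil(n)

for N in (100, 10_000):
    n = srs_sample_size(N, std_dev=15, tolerable_error=3)
    print(f"population {N}: sample {n} pupils ({100 * n / N:.0f} percent)")

# A more variable score, or a smaller tolerable error, calls for more pupils.
print(srs_sample_size(10_000, std_dev=30, tolerable_error=3))
print(srs_sample_size(10_000, std_dev=15, tolerable_error=1.5))
```

With these assumed values the formula calls for 50 pupils from a population
of 100 and 100 pupils from a population of 10,000, which matches the
percentages in the rule of thumb: the required number of pupils grows far
more slowly than the population.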



Should simple random sampling really be used to achieve Objective 1?
Probably not, for the following reasons. First, there are other, more
efficient sampling methods that can be used. Second, it would be
administratively cumbersome to use simple random sampling. As previously
mentioned, the assessment director would need a complete list of all
fifth-graders enrolled in the state. While such a list could probably
be compiled in most states, its preexistence is doubtful, and its
compilation would be expensive. When sampled fifth-graders were actually
tested, some classes of 25 would have 20 tested pupils, some would have
only one or two tested pupils, and some would have none at all. Testing
only some of the pupils in a classroom is administratively cumbersome, and
probably should be avoided unless the number of pupils drawn from each
classroom is very small.

Simple random sampling is almost always discussed in sampling texts
because it is a straightforward procedure, and can be used to illustrate
important sampling properties. It also provides a benchmark against
which the efficiency of more sophisticated sampling procedures can be
compared. For statewide assessment the practicality of simple random
sampling is limited, although it may be useful when the objective is to
estimate some property of schools or school districts.

Stratified Random Sampling

An alternative to simple random sampling that could be used to
achieve Objective 1 is stratified random sampling. Stratified random
sampling is generally more efficient than simple random sampling, because
it takes advantage of facts that are known about the elements of a
population. Stratified random sampling can be contrasted with simple random
sampling by considering a specific example. Suppose that the size of a
simple random sample necessary to estimate the average reading achievement
of a state's fifth-graders was found to be 200. Following the procedure
for selecting a simple random sample, it is possible that the 200 pupils
selected might have an achievement average that was far higher than the
average for all fifth-graders in the state. This would almost surely be
the case if most of the pupils in the sample had verbal IQ scores that
were, say, above 130. Suppose it was possible to guard against samples
that had almost all high-IQ pupils, by ensuring that any sample selected
would have some low-IQ pupils, some mid-IQ pupils and some high-IQ pupils,
with percentages of each similar to the percentages for the whole state.
Samples of pupils that came close to representing the state's
fifth-graders on verbal IQ would probably do a good job of representing them
on reading achievement. This is true because verbal IQ-score and reading
achievement are highly related; those with high verbal IQ-scores
are likely to have high reading achievement scores, and those with low verbal
IQ-scores are likely to have low reading achievement scores. Use of known
relationships among variables and available data on sampling units is
what makes stratified sampling efficient. Stratified sampling prevents
the selection of extremely unrepresentative samples (such as all
high-IQ pupils), and thereby prevents large estimation errors. To achieve an
estimator variance of a given size, stratified sampling will therefore
require a smaller sample size than will simple random sampling.




In stratified random sampling, elements of the population are first
classified into categories called strata, according to their values on one
or more stratification variables. In the previous example, verbal IQ
played the role of a stratification variable. Any variable for which a
value is known for every element of the population can be used as a
stratification variable. However, stratified sampling won't be efficient
unless the stratification variable and the variable for which estimates
are desired (reading achievement in the previous example) are highly
related.

Considering the previous example more explicitly, suppose that verbal
IQ was to be used as a stratification variable, and the parameter to be
estimated was the average reading achievement of all fifth-graders in a
state. The first step in using stratified random sampling would be to
define appropriate strata. For example, low-IQ pupils might be defined
as those with verbal IQ-scores below 85, mid-IQ pupils might be defined
as those with verbal IQ-scores between 86 and 115, and high-IQ pupils as
those with verbal IQ-scores of 116 or more. These IQ intervals would
define three strata, and might be labeled stratum 1, stratum 2 and stratum
3. Once the strata were defined, each fifth-grader in the state would be
classified as a member of stratum 1, 2 or 3 depending on his (her) verbal
IQ-score. When all fifth-graders in the state had been assigned to strata,
a simple random sample of pupils would be drawn from each stratum. The
average reading achievement of pupils sampled from each stratum would then
be calculated, and these averages would be weighted appropriately to form
an estimate of the average achievement of fifth-graders throughout the state.
The estimator would be both unbiased and consistent.
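
The phrase "weighted appropriately" can be made concrete: in the usual
stratified estimator, each stratum's sample average is weighted by that
stratum's share of the population. The following is a minimal sketch in
Python; the stratum sizes and the sampled reading scores are hypothetical.

```python
def stratified_mean(strata):
    """Weight each stratum's sample average by its share of the population.

    `strata` is a list of (population_count, sample_scores) pairs, one pair
    per stratum.
    """
    total = sum(count for count, _ in strata)
    estimate = 0.0
    for count, scores in strata:
        stratum_average = sum(scores) / len(scores)
        estimate += (count / total) * stratum_average
    return estimate

# Hypothetical data: (fifth-graders in the stratum statewide, sampled scores).
strata = [
    (20_000, [31, 35, 30, 36]),  # stratum 1: low verbal IQ
    (60_000, [48, 52, 50, 50]),  # stratum 2: mid verbal IQ
    (20_000, [66, 70, 64, 68]),  # stratum 3: high verbal IQ
]
print(f"estimated statewide average: {stratified_mean(strata):.2f}")
```

Because the weights come from the known stratum sizes rather than from the
numbers of pupils that happen to be sampled, the estimate is not distorted
when one stratum is sampled more heavily than another.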

For estimating a statewide average, stratified random sampling has the
same disadvantages as simple random sampling. It requires a sampling frame
that lists all fifth-graders in the state. In addition, it might result in
selection of a few pupils from some classes and many pupils from others.
It thus has the potential of being administratively disruptive in some
schools and districts.

The main advantage of stratified random sampling is its efficiency
(when the right stratification variables are used). In addition, when
stratified sampling is used in statewide assessment or in other
educational data-collection programs, the information needed for
stratification is generally available. During the last decade at least, group
IQ testing has been almost universal, and nearly all school districts
administer standardized achievement tests (Goslin, 1967). In addition, school
systems record all manner of information on their pupils such as parental
occupations, educational levels of parents, and sizes of pupils' families.
All of these variables tend to be highly related to current educational
achievement (Mollenkopf and Melville, 1956; Burkhead, 1967), and if
available, would be quite useful as stratification variables in
statewide assessments.

In theory, strata can be defined by any number of variables. One could,
for example, stratify pupils by IQ-score and status-level of father's
occupation. The strata thus formed might be labeled low-IQ and low-status
occupation, low-IQ and mid-status occupation, low-IQ and high-status
occupation, mid-IQ and low-status occupation, etc. Stratification by two
or more variables is only efficient when each stratification variable is
highly related to the variable for which estimates are sought, and when
the stratification variables are not highly related among themselves. The
previous example, stratification of pupils by IQ-level and by status-level
of father's occupation, would probably be an unnecessarily cumbersome
procedure. Although reading achievement is highly related to both IQ-level
and status-level of father's occupation, the two stratification variables
are themselves highly related. Pupils from high-status homes tend to have
higher IQ levels, and vice versa. Stratifying pupils by these two variables
is therefore redundant; stratification by either variable would be almost
as efficient as stratification by both, although IQ-level would probably be
a better stratification variable than would father's occupation.

Practical use of stratified sampling requires several design decisions
in addition to those already discussed. Once stratification variables
have been chosen, the sample designer must decide how many strata to use,
the limits or boundaries for each stratum (e.g., IQ below 90, IQ between
91 and 110, etc.), the size of the sample to select, and the number of units
to sample from each stratum. Each of these topics has been the subject
of theoretical and empirical study in the theory of sampling. Again,
some practical factors that influence the decisions will be described. The
choice of number of strata depends on the magnitude of the relationship
between the stratification variable and the variable for which estimates
are sought. The stronger the relationship, the larger the number of strata
that will prove useful, although practical limits are reached very quickly.
Even when the stratification variable and the variable of interest have a
correlation coefficient of 0.90, there is not much advantage to using more
than four strata (Cochran, 1963). The problem of determining boundaries
for strata so as to make stratified sampling as efficient as possible has
been given considerable attention by Dalenius and Hodges (1959). They
provide formulas that can be used in practice, but defy simple, intuitive
explanation. Explicit formulas also exist for determining the sample
size to use in stratified sampling. As in simple random sampling,
required sample size depends on the population size and the size of the
estimation errors one can tolerate. Unlike simple random sampling, the
sample size for stratified sampling also depends on how well the
population has been stratified. The object of stratification is to form
categories within which sampling units are as nearly alike as possible on the
variable of interest. The more nearly this has been accomplished, the smaller
will be the sample size required to achieve a given estimator variance.
Determination of the number of units to be sampled from each stratum is
generally handled in one of two ways. Using a procedure termed optimal
allocation, a specific formula indicates the sample size for each stratum.
The advantage of this procedure is that it makes a given stratified sampling
procedure as efficient as possible (hence the term optimal). An
alternative procedure is termed proportional allocation. With proportional
allocation, the size of the sample selected from each stratum is
proportional to the number of population elements in the stratum. The
advantages of proportional allocation include simplified estimation formulas,
and assurance that the stratified sampling procedure will be at least as
efficient as simple random sampling.
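
The two allocation rules can be compared side by side. The paper does not
give the optimal allocation formula; the sketch below, in Python, assumes the
common textbook form (often called Neyman allocation) in which the sample is
divided in proportion to stratum size times within-stratum standard
deviation, with equal cost per sampled pupil. The stratum sizes and standard
deviations are hypothetical.

```python
def proportional_allocation(n, sizes):
    """Divide a sample of n in proportion to the stratum sizes N_h."""
    total = sum(sizes)
    return [n * N_h / total for N_h in sizes]

def neyman_allocation(n, sizes, std_devs):
    """Divide a sample of n in proportion to N_h * S_h.

    Assumes the cost of sampling one unit is the same in every stratum.
    """
    weights = [N_h * S_h for N_h, S_h in zip(sizes, std_devs)]
    total = sum(weights)
    return [n * w / total for w in weights]

# Hypothetical strata: pupil counts and within-stratum standard deviations
# of reading achievement.
sizes = [20_000, 60_000, 20_000]
std_devs = [12.0, 8.0, 12.0]

print("proportional:", proportional_allocation(200, sizes))
print("optimal     :", neyman_allocation(200, sizes, std_devs))
```

Under these assumed values the optimal rule moves pupils out of the large but
relatively uniform middle stratum and into the two more variable strata. In
practice the allocations would be rounded to whole pupils, and the
within-stratum standard deviations would have to be estimated from earlier
data.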

Systematic Sampling

The average reading achievement of fifth-graders in a state could
also be estimated by using a systematic sampling procedure. Several
systematic sampling procedures have been developed in the last two decades,
but only the one used most widely, linear systematic sampling, will be
considered.

Like simple random sampling, linear systematic sampling would require
a sampling frame of fifth-grade pupils. Instead of consulting a table of
random numbers to determine each sampled pupil, a random number table is
consulted only once with linear systematic sampling. The sampling frame of
pupils is considered to be an ordered list. The first sampled pupil is
selected randomly, and successive pupils are selected at multiples of a
constant interval beyond the first. A specific example may help to
clarify the procedure.



Suppose it was desired to select a linear systematic sample consisting
of ten percent of the fifth-graders in the population. To determine
the first sampled pupil, a number between one and ten would be drawn from
a random number table. The pupil with the corresponding number on the
sampling frame would become the first sampled pupil. Thereafter, every
tenth pupil would be sampled. Thus if the random number six were drawn
from the table, the first sampled pupil would be the one listed sixth in
the frame, the next sampled pupil would be listed 16th in the frame, the
next 26th, and so on, until the sampling frame had been exhausted.
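
The selection rule amounts to a single random draw followed by counting. The
following is a minimal sketch in Python; the function name and the frame of
95 pupils are hypothetical, while the interval of ten and the random start of
six follow the case just described.

```python
import random

def linear_systematic_sample(frame_size, interval, start=None, seed=None):
    """Return the pupil numbers (counting from 1) chosen by linear
    systematic sampling.

    A starting number between 1 and `interval` is drawn at random unless one
    is supplied; every `interval`-th pupil after it is then selected until
    the frame is exhausted.
    """
    if start is None:
        start = random.Random(seed).randint(1, interval)
    if not 1 <= start <= interval:
        raise ValueError("start must lie between 1 and the interval")
    return list(range(start, frame_size + 1, interval))

# Ten percent sample with random start six; the frame size is hypothetical.
selected = linear_systematic_sample(frame_size=95, interval=10, start=6)
print(selected)
```

When the frame size is not a multiple of the interval, the number of pupils
selected depends on the random start: with 95 pupils, a start of one through
five yields ten pupils, while a start of six through ten yields nine.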

     NUMERICAL EXAMPLE: Consider the selection of a ten percent
     systematic sample from a population of fifth-grade
     pupils. Suppose that a table of random numbers had been
     consulted to select a number between one and ten, and that
     the number drawn was six. If the sampling frame were as
     follows, the sampled pupils would be those marked with an
     asterisk:

          Pupil Number        Pupil Name

               1              Murphy, John
              [?]             Bruno, Barbara
               7              [illegible], Carol
              13              [illegible], Brenda
              14              [illegible], Thomas
              15              Angoff, Douglas
              16*             [illegible], Sharron
              17              [illegible], Joan
              18              Willis, Kevin
              19              Picard, Ronald
              20              [illegible], Linda
              21              [illegible], Cheryl
              22              Kristof, Charles
              23              Patterson, Virginia
              24              Johnson, Elmer
              25              [illegible], Anne
              26*             [illegible], Mildred
              27              Walsh, Helen
              28              Adams, Patricia
               .
               .
               .



     The three dots signify the continuation of the list, and
     the selection of every tenth pupil beyond the 26th,
     until the sampling frame has been exhausted. Thus the
     last pupil selected for the sample is the highest-numbered
     pupil in the frame whose number ends in six.

Systematic sampling has the advantage that it is easy to apply by
hand, whereas simple random sampling or stratified random sampling are
quite tedious without a computer when a sample of appreciable size must
be drawn. When used in an assessment program, systematic sampling would
also ensure that the number of pupils sampled from each classroom was
approximately equal, provided the sampling frame listed pupils
sequentially by classroom. Like simple random sampling though, systematic
sampling would require a list of all fifth-graders in the state.

Unlike simple random sampling and stratified sampling, linear
systematic sampling is sometimes undependable. It is not always
consistent, and there are no really good ways to estimate mean square error.
Conversely, linear systematic sampling can be very efficient if the list
used for sampling is carefully constructed. If pupils were listed
alphabetically in the sampling frame, one would suppose that their average
achievement might be estimated about as efficiently as with simple random
sampling. In fact, alphabetic listing of pupils sometimes results in
more efficient estimation (Jaeger, 1970), although this won't always be
the case. Real gains in the efficiency of systematic sampling can be
realized by listing pupils in increasing order on some variable that is
highly related to the variable of interest. For example, if a linear
systematic sample of fifth-graders was selected from a sampling frame in
which pupils were listed in increasing order of their verbal IQ-scores,
average reading achievement could be estimated very efficiently. The
effect of such ordered listings is much the same as the effect of
stratification, since sampling from an ordered list ensures that some pupils
are sampled at all levels of the variable used for ordering.

Linear systematic sampling is one of those procedures, mentioned
earlier, that isn't always consistent and, depending on the relationship
between the sample size and the population size, may lead to biased
estimation. Usually the magnitude of the estimation bias is inconsequential,
but the lack of consistency may prove to be a serious problem. If sampling
must be done without a computer and if the required sample size is large,
linear systematic sampling should be considered for statewide assessment.
Otherwise, alternative sampling procedures (such as stratified sampling)
will provide more dependable results.

Cluster Sampling

In the sampling procedures discussed to this point, the sampling
units used were basic elements of a population; e.g., individual pupils.
In cluster sampling, the sampling units are not basic population elements
but are groups or aggregations of such elements. These groups of elements
are termed clusters.


In most applications of cluster sampling, the clusters used are
naturally-occurring groups. In surveys of consumer behavior, for example,
homes are frequently used as sampling units. When estimating the average
achievement of fifth-graders throughout a state, several naturally-occurring
clusters of pupils might be used: school districts, schools, or homerooms.
Of course, these aren't the only possibilities for clusters. One might
consider groups of students living in particular areas of the state or
groups of pupils with last names beginning with the same letter. However,
naturally-occurring clusters afford far greater administrative convenience
than would these contrived clusters. Pupils can readily be identified by
classroom, school or school district, and could easily be assembled for
testing and measurement on a homeroom-by-homeroom or school-by-school basis.



If a cluster sampling procedure is identified by the units used as
clusters (school districts, schools, homerooms, or combinations of these),
many different cluster sampling procedures could be used to gather data
for Objective 1. Before enumerating some of the possibilities, let's
consider one in detail, and thereby introduce some of the language of
cluster sampling.

Suppose it was decided to use schools as clusters, and to test the
reading achievement of all fifth-graders enrolled in sampled schools. This
procedure is an example of single-stage cluster sampling. The sampling
plan would be carried out by first constructing a sampling frame of all
schools in the state that enrolled fifth-grade pupils. A simple random
sample of schools could then be selected using a table of random numbers,
just as in simple random sampling of pupils, described above. All of the
fifth-grade pupils in sampled schools would then be given a reading
achievement test, and appropriate formulas would be applied to the test
results in order to estimate average achievement for the state. The formulas
to be used (estimators) are well known in the sampling theory literature,
and can be found in any standard text such as Cochran (1963).

This cluster sampling procedure has some obvious administrative
advantages. First, the state department of education is likely to have a
complete list of schools that enroll fifth-graders, although it probably
doesn't have a list of fifth-graders enrolled in the state. Thus a
ready-made sampling frame is likely to exist for this sampling procedure.
Second, only a sample of schools will be involved in testing. Disruption
of normal academic procedures will be confined to the sample of schools,
the costs of distributing testing materials will be reduced, and
administrative procedures will be simplified.

The administrative convenience of this sampling procedure is likely
to be offset by a substantial reduction in efficiency. In almost all
cases, cluster sampling of schools will be far less efficient than simple
random sampling of pupils. The "almost" is inserted in the previous
sentence because there are notable exceptions to the rule. The efficiency
of single-stage cluster sampling depends on many factors, some of which
can be controlled by the sample designer. The composition of the clusters
used influences efficiency to a large degree. Two extreme cases will
illustrate this point. To take one extreme, suppose that all of the
fifth-graders in any given school had the same reading achievement score. In
this case, testing all the fifth-graders in a school would be a waste
of time and money; the average achievement in a school could be determined
by testing just one fifth-grader. More to the point, the effective
sample size is equal to the number of schools in which testing takes
place, rather than the number of pupils tested (since testing more
than one pupil in a school would provide only redundant information).
In technical terms, this extreme case represents a situation in which
all of the elements within a cluster are completely homogeneous on the
variable to be estimated. The other extreme would occur in a situation
where the average reading achievement of fifth-graders in each school
was identical, and equalled the average for the whole state. In this
case, the average for the state could be estimated perfectly by
collecting data in only one school, since testing pupils in more than one
school would provide only redundant information. In technical terms,
this extreme represents a situation in which elements within a cluster
are as heterogeneous as elements within the entire population, and
where clusters are completely homogeneous among themselves. In real life,
the composition of the population will fall somewhere between these extremes.
For cluster sampling to be efficient, we would like the composition of
the population to be similar to the second extreme: not much difference
among clusters on the variable to be estimated, and a lot of heterogeneity
among elements in the same cluster. With this composition, only a few
clusters need be sampled in order to get a good representation of the
entire population.
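
The idea of an effective sample size can be given a rough numerical form. The
paper does not state a formula; the sketch below, in Python, assumes the
common approximation for clusters of equal size m, in which the variance of
the estimate is inflated by the design effect 1 + (m - 1) rho, where rho
(the intraclass correlation) measures how alike pupils in the same school
are. The design of 40 schools with 50 tested pupils each is hypothetical.

```python
def effective_sample_size(num_clusters, cluster_size, intraclass_corr):
    """Approximate number of independently sampled pupils that a cluster
    sample is worth.

    Uses the design effect 1 + (m - 1) * rho for clusters of equal size m,
    where rho is the intraclass correlation of the variable of interest.
    """
    design_effect = 1 + (cluster_size - 1) * intraclass_corr
    return num_clusters * cluster_size / design_effect

# Hypothetical design: 40 schools, 50 fifth-graders tested in each.
for rho in (1.0, 0.2, 0.0):
    n_eff = effective_sample_size(40, 50, rho)
    print(f"rho = {rho:.1f}: 2000 pupils tested, worth about {n_eff:.0f}")
```

With rho equal to one, the first extreme described above, the 2,000 tested
pupils are worth no more than the 40 schools they come from. With rho equal
to zero, pupils within a school are no more alike than pupils drawn at random
from the state, and nothing is lost by testing them in clusters.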

Unfortunately, the naturally-occurring clusters available for
statewide assessments tend to provide homogeneity within clusters and
heterogeneity between clusters for many variables likely to be of interest.
Consider sampling of schools to estimate pupil achievement. At least
before busing for purposes of desegregation, the attendance areas of schools
tended to be defined by neighborhoods that were relatively homogeneous in
their socio-economic and racial compositions. In a society where
neighborhoods tend to be defined by people of the same social and economic
level, it is natural that schools tend to be homogeneous in these variables.
Since pupils' scores on achievement tests are highly related to the
socio-economic status of their families, schools also tend to be homogeneous
in measured academic achievement.

The composition of the population of interest (e.g., all fifth-graders
in a state) is a factor beyond the control of the sample designer; whatever
is found must be tolerated. However, there are factors that the user of
cluster sampling can control so as to greatly increase sampling efficiency.
One such factor is the estimation procedure employed. When the clusters to
be sampled are not only heterogeneous, but also tend to vary greatly in size
(both are tendencies of schools and school districts), simple random sampling
of clusters with unbiased estimation of averages is very inefficient. A
more efficient alternative involves simple random sampling of clusters and
use of an estimation procedure known as ratio estimation. To use ratio
estimation, the number of elements in each cluster must be known, a
requirement that is easily met in most assessment applications. The ratio
estimator is biased, but consistent. The amount of bias is likely to be
small for populations used in statewide assessments, and the mean square
error will usually be much smaller than that of the unbiased estimator.
Formulas for ratio estimation can be found in Murthy (1967), Cochran (1963)
and Hansen, Hurwitz and Madow (1953).
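A minimal sketch of the two estimators, assuming every pupil in a sampled school is tested and that each school is recorded as an (enrollment, average score) pair. The function names and the figures are hypothetical; the cited texts give the full formulas, including variances.

```python
def unbiased_mean(sample, n_schools_total, n_pupils_total):
    """Expand the sampled score totals to the whole population of schools,
    then divide by the known number of pupils in the population."""
    expanded_total = n_schools_total / len(sample) * sum(m * y for m, y in sample)
    return expanded_total / n_pupils_total


def ratio_mean(sample):
    """Sampled score total divided by sampled enrollment."""
    return sum(m * y for m, y in sample) / sum(m for m, _ in sample)


# Hypothetical sample of 3 schools from a population of 6 schools, 300 pupils.
sample = [(30, 60.0), (60, 70.0), (90, 65.0)]
estimate_unbiased = unbiased_mean(sample, n_schools_total=6, n_pupils_total=300)
estimate_ratio = ratio_mean(sample)
```

In this made-up sample the three schools happen to enroll 180 of the 300 pupils. The unbiased estimator gives 79.0, well above any sampled school's average, because it treats the sampled schools as typical in size; the ratio estimator divides by the enrollment actually sampled and stays within the range of the school averages.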

Additional alternatives modify both the sampling procedure and the
estimation procedure used with single-stage cluster sampling. By
definition, each cluster has an equal chance of being selected when clusters
are sampled randomly. One alternative procedure, known as PPS sampling,
selects clusters with probabilities proportional to their sizes. If schools
were being used as clusters in order to estimate average fifth-grade reading
achievement, the probability of selecting a given school would depend on
its fifth-grade enrollment. A school with 200 fifth-graders would be twice
as likely to enter the sample as would a school with 100 fifth-graders.
The PPS procedure provides not only a sampling method but associated
estimators of averages, proportions and variances as well. It is simplest
to do PPS sampling "with replacement" since selection probabilities vary as
the sample is drawn, when sampling is done without replacement. PPS sampling
with replacement provides unbiased estimation, but is an inconsistent
procedure. The mean square error of the estimator gets consistently smaller
as sample size is increased, but does not go to zero when the sample size
equals the population size. In practical situations, this lack of
consistency will be a problem only when the required sample size is very
close to the population size.
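A short sketch of PPS selection with replacement, assuming school enrollments are known and all pupils in a drawn school are tested. When the selection probabilities are exactly proportional to enrollment, the usual with-replacement estimator (often called the Hansen-Hurwitz estimator) of the average per pupil reduces to the plain average of the drawn school averages. The enrollments, averages and function names are hypothetical.

```python
import random


def pps_sample(sizes, n, rng):
    """Draw n cluster indices with replacement, probability proportional to size."""
    return rng.choices(range(len(sizes)), weights=sizes, k=n)


def pps_mean(cluster_means, drawn):
    """Average of the drawn cluster means; a school drawn twice counts twice."""
    return sum(cluster_means[i] for i in drawn) / len(drawn)


sizes = [100, 200, 100]       # hypothetical fifth-grade enrollments
means = [60.0, 70.0, 80.0]    # hypothetical school averages
drawn = pps_sample(sizes, n=2, rng=random.Random(0))
estimate = pps_mean(means, drawn)

# Averaging the school means over the selection probabilities recovers the
# true average per pupil, which is why the procedure is unbiased.
probs = [s / sum(sizes) for s in sizes]
expected = sum(p * m for p, m in zip(probs, means))
true_mean = sum(s * m for s, m in zip(sizes, means)) / sum(sizes)
```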

PPS sampling is efficient only when cluster size is highly related to
the variable for which estimates are desired. Since school size and school
district size are not highly related to basic-skills achievement (Burkhead,
1967), PPS sampling will not be efficient for estimation of average
achievement in a state. Some school and district "input" variables (such as
the average value of the taxable property in an attendance area or district)
are highly related to school or district size, and PPS sampling would
probably be very efficient for estimation of these variables.

A final alternative, PPES sampling, is likely to be a very efficient
way of estimating average achievement in a state. PPES stands for
"probability proportional to expected size" (Cochran, 1963), a term that is
appropriate in some sampling contexts but not in the context of statewide
assessment. PPES sampling was first introduced to handle situations in
which cluster sizes were not known exactly. In these cases, "expected
sizes" rather than actual sizes were used.

In assessment applications, cluster sizes are usually known but are
often nearly unrelated to the variable for which estimates are desired.
The greater the relationship between the variable for which estimates are
sought and the "expected size" variable, the higher the efficiency of PPES
sampling. This being true, clusters can be sampled with probabilities
proportional to any variable that has a known value for every cluster in the
population; the variable used can be totally unrelated to cluster size.
Consider the case of Objective 1. Suppose that a group IQ-test had been
administered to every fourth-grader in the state in the year preceding the
current assessment. If the state had records containing the average IQ of
fourth-graders for each school and the fourth-grade enrollment of each school,
the product of these two could be used very effectively as an "expected size"
measure when estimating average fifth-grade reading achievement. This
procedure would be highly efficient because the average of fourth-grade
IQ-scores and the average of fifth-grade reading achievement scores would
be highly related across schools.
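The "expected size" idea can be sketched as follows, assuming sampling with replacement and the usual with-replacement estimator in which each drawn school's score total is divided by its selection probability. The enrollments, prior IQ averages, reading averages and function names are all hypothetical.

```python
def ppes_probabilities(enrollments, prior_averages):
    """Selection probabilities proportional to enrollment x prior average score."""
    measures = [m * a for m, a in zip(enrollments, prior_averages)]
    total = sum(measures)
    return [x / total for x in measures]


def ppes_mean(drawn, school_totals, probs, n_pupils):
    """Estimated average per pupil from schools drawn with replacement."""
    total_hat = sum(school_totals[i] / probs[i] for i in drawn) / len(drawn)
    return total_hat / n_pupils


enrollments = [100, 200, 100]          # hypothetical
prior_iq = [95.0, 100.0, 110.0]        # hypothetical fourth-grade IQ averages
reading_means = [60.0, 70.0, 80.0]     # hypothetical fifth-grade averages
totals = [m * y for m, y in zip(enrollments, reading_means)]

probs = ppes_probabilities(enrollments, prior_iq)
# Estimate obtained if only school i were drawn, for each school in turn.
single_draw = [ppes_mean([i], totals, probs, sum(enrollments)) for i in range(3)]
```

In this made-up population the true average is 70. Each single-school estimate lies between about 64 and 74, a narrower spread than the raw school averages of 60 to 80, because the size measure tracks the reading totals closely. That closeness is what the text means by efficiency.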

Like PPS sampling, PPES sampling results in unbiased but inconsistent
estimation. Again, inconsistency will be a practical problem only when
the required sample size is very close to the population size. Additional
information on PPS sampling and PPES sampling can be found in Murthy (1967)
and in Cochran (1963).




Instead of using schools as clusters, the average reading achievement
of fifth-graders in the state could be estimated by using either homerooms
or school districts as clusters. Either of these single-stage cluster
sampling procedures would be feasible, provided appropriate sampling frames
could be constructed. Undoubtedly, every state department of education
has a complete listing of school districts that enroll fifth-graders. A
sampling frame of homerooms probably wouldn't exist in most states though,
and sampling by homerooms would require a specially constructed frame.
The cost of constructing a sampling frame of homerooms would probably be
more than offset by the increased efficiency of a single-stage cluster
sampling plan with homerooms as clusters. In most states, cluster sampling
of homerooms would be far more efficient than cluster sampling of schools,
and cluster sampling of schools would be more efficient than cluster sampling
of districts. The increased efficiency is due in part to substantially
greater size variability among districts than among schools, and among
schools than among homerooms.

Thus far we have considered only single-stage cluster sampling
procedures. Many multi-stage cluster sampling procedures could be used to
estimate the average reading achievement of a state's fifth-graders.
Possibilities include the following: 1) A random sample of schools could be
drawn, and within sampled schools, random samples of homerooms could be
selected. All fifth-graders in sampled homerooms would be tested. 2) A
random sample of districts could be drawn, and within sampled districts,
random samples of schools could be selected. All fifth-graders in sampled
schools would be tested. 3) A random sample of districts could be drawn,
and within sampled districts, a random sample of homerooms could be selected.
All fifth-graders within sampled homerooms would be tested. 4) A random
sample of districts could be selected, and within sampled districts, random
samples of fifth-graders could be selected and tested. 5) A random sample
of schools could be drawn and within sampled schools, random samples of
fifth-graders could be selected and tested. 6) A random sample of
fifth-grade homerooms could be selected, and within sampled homerooms, random
samples of pupils could be drawn and tested. 7) A random sample of
districts could be selected, random samples of schools could be drawn
within sampled districts, and random samples of homerooms could be selected
within each sampled school. All fifth-grade pupils within sampled
homerooms would be tested. 8) A random sample of districts could be selected,
random samples of schools could be drawn within sampled districts, random
samples of homerooms could be drawn within sampled schools, and random
samples of pupils would be selected and tested within sampled homerooms.
Although these eight procedures do not exhaust the possibilities, they
provide sufficient illustration of the flexibility of cluster sampling.

Procedures 1) through 6) are examples of two-stage cluster sampling.
In procedure 2), for example, sampling of districts constitutes the first
stage (districts are termed primary sampling units or PSU's), and sampling
of schools is the second stage. Schools would be called secondary sampling
units. Procedure 7) is an example of a three-stage cluster sampling
procedure, with districts as PSU's, schools as secondary sampling units, and
homerooms as tertiary sampling units. Procedure 8) is a four-stage
cluster sampling procedure.
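As an illustration of the two-stage case, here is a sketch of procedure 5): a simple random sample of schools, then a simple random sample of pupils within each sampled school. The sampling frame (a mapping from school name to pupil identifiers), the function name and the sample sizes are hypothetical.

```python
import random


def two_stage_sample(frame, n_schools, n_pupils, rng):
    """Stage 1: simple random sample of schools (the primary sampling units).
    Stage 2: simple random sample of pupils within each sampled school."""
    schools = rng.sample(sorted(frame), n_schools)
    return {s: rng.sample(frame[s], min(n_pupils, len(frame[s]))) for s in schools}


# Hypothetical frame: 8 schools with 20 to 27 fifth-graders each.
frame = {f"school_{i}": [f"s{i}_p{j}" for j in range(20 + i)] for i in range(8)}
sample = two_stage_sample(frame, n_schools=3, n_pupils=5, rng=random.Random(1))
```

Adding a third stage would mean nesting one more draw, for example homerooms within schools before pupils within homerooms.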

Multi-stage cluster sampling will often be more statistically
efficient than single-stage cluster sampling. That is, the mean square
error of the estimator will be smaller for a given number of elementary
units in the sample. There are also some administrative advantages to
multi-stage sampling. If sampling frames don't exist, they need only
be constructed for a sample of PSU's. For example, if a state wanted
to use homerooms as clusters but didn't have the required sampling frame,
it could use two-stage sampling with districts as PSU's and homerooms as
secondary sampling units. The district sample would be chosen first, and
sampling frames of homerooms would be needed only for sampled districts.

Cluster sampling can also be used in combination with other
procedures such as stratified sampling or systematic sampling. One could,
for example, select samples of schools stratified by the average IQ-level
of enrolled fifth-graders or by a measure of the average socio-economic
status of pupils' families. As another alternative, one could select a
simple random sample of school districts, and select systematic samples
of fifth-graders from lists arranged in order of increasing IQ-score
within each sampled district. Each of these alternatives would be more
efficient than multi-stage random sampling.
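The second alternative relies on systematic selection from an ordered list. A minimal sketch of linear systematic sampling within one sampled district follows, assuming the list length is a multiple of the sample size (otherwise the entries past the last full interval are never reached). The pupil list, the scores and the function name are hypothetical.

```python
import random


def linear_systematic_sample(ordered_list, n, rng):
    """Pick a random start in the first interval, then take every k-th entry."""
    k = len(ordered_list) // n      # sampling interval
    start = rng.randrange(k)        # random start, 0 <= start < k
    return [ordered_list[start + i * k] for i in range(n)]


# Hypothetical district list of (pupil id, prior IQ score), ordered by score.
pupils = sorted(((f"p{i}", 85 + (i * 7) % 40) for i in range(100)),
                key=lambda record: record[1])
sample = linear_systematic_sample(pupils, n=10, rng=random.Random(3))
```

Because the list is ordered by score, every sample spreads across the whole score range, which is the source of the efficiency gain.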

The final choice among cluster sampling procedures depends on many
factors, not the least of which is previous knowledge of the population
of interest. To choose among sampling procedures intelligently, one
should have some idea of the degree of homogeneity within and among
potential clusters, and the relationships among variables for which
estimates are sought and those that might be used for stratification
or as measures of size. Even with these kinds of data, assurance that
one has chosen the best of the available alternatives can only come
through careful analysis and often, lengthy computation. (See Appendix
A).

It cannot be overemphasized that data typically available in schools
and school districts can be used very effectively to design efficient
sampling procedures. A wealth of information on students, teachers, classes,
schools and school districts is routinely recorded and filed in school
district offices and in offices of state departments of education. Data
from previous testing programs are abundantly available in almost all
school districts and states. Background information on pupils and teachers
is also on file in most school districts. If judiciously selected and
evaluated, these data can be used for stratification, for arrangement of
populations in ordered lists, and for pretesting of potentially efficient
sampling procedures. This mechanical use of information to arrange and
sort populations should not provoke charges of invasion of privacy, since
individuals' names need be associated with individual data elements only
for purposes of sampling.

Matrix Sampling

Each of the sampling procedures considered to this point has assumed
that all sampled pupils respond to the same set of measures; e.g., the
same reading comprehension test. In the past ten years, researchers have
paid increasing attention to procedures that sample test items as well as
students. These procedures are termed multiple matrix sampling,
and have been used successfully in National Assessment as well as in
several statewide assessments.

Multiple matrix sampling could be used to estimate the average
reading achievement of all fifth-graders in a state. The procedure might be
as follows. Suppose that a 50-item reading achievement test was to be
used. Instead of administering the entire test to all sampled pupils,
the test could be divided into five forms with ten items each. Each
sampled pupil would then take a 10-item form instead of the entire
50-item test. Each of the 50 items would be used in a 10-item form, and
approximately equal numbers of pupils would complete each 10-item form.
Lord (1955; 1962) has developed formulas for estimating the average score
pupils would have earned, had each completed the entire 50-item test.
Empirical studies of the best way to divide tests into forms and the sizes
of pupil samples to use with each form have been completed by Shoemaker
(1970; 1971) and Knapp (1968), among others.

To date, statistical procedures for analysis of multiple matrix
sampling have been developed only for simple random sampling of items
and pupils. Although more complex designs can be used, needed analytic
procedures are not yet available.
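The 50-item example can be sketched as below. This is only an illustration of the bookkeeping, not a reproduction of Lord's formulas: it splits the items at random into five 10-item forms, rotates the forms across pupils, and estimates the average full-test score as the sum of the average form scores (the average of a sum of part scores equals the sum of their averages). The function names, pupil identifiers and scores are hypothetical.

```python
import random


def assign_forms(n_items, n_forms, pupils, rng):
    """Split items at random into equal forms and rotate forms across pupils."""
    items = list(range(n_items))
    rng.shuffle(items)
    per_form = n_items // n_forms
    forms = [items[f * per_form:(f + 1) * per_form] for f in range(n_forms)]
    order = list(pupils)
    rng.shuffle(order)
    assignment = {pupil: i % n_forms for i, pupil in enumerate(order)}
    return forms, assignment


def estimated_full_test_mean(form_scores, n_forms):
    """form_scores holds (form index, number correct on that form) pairs.
    Returns the sum of the per-form average scores; every form needs
    at least one pupil."""
    totals = [0.0] * n_forms
    counts = [0] * n_forms
    for form, score in form_scores:
        totals[form] += score
        counts[form] += 1
    return sum(t / c for t, c in zip(totals, counts))


forms, assignment = assign_forms(50, 5, range(100), random.Random(7))
# Hypothetical results: two pupils per form, scores out of 10.
example_scores = [(0, 6), (0, 8), (1, 5), (1, 7), (2, 9),
                  (2, 7), (3, 4), (3, 6), (4, 8), (4, 6)]
estimate = estimated_full_test_mean(example_scores, n_forms=5)
```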

Objective 2: Estimating the proportion of third-graders in each school
district who can successfully achieve an arithmetic objective.

Some statewide assessments use test items that are specifically
designed to measure the achievement of particular objectives. For example,
an assessment might include items designed to measure achievement of
the arithmetic objective "Addition of pairs of single-digit integers".
Five such items might be administered to a pupil, and the pupil might be
said to have achieved the objective provided he can successfully complete
three of the five items.

Suppose that a statewide assessment contained such objectives-related
items, and that the principal purpose of the assessment was to
determine the proportion of pupils in each of the state's school districts
that had achieved each designated objective.

Many of the sampling procedures described above could be used to
achieve Objective 2. Only in very small school districts (e.g., those
with grade three enrollments under 200) would sampling be uneconomical.
Among the procedures that might be used to achieve Objective 2 are simple
random sampling of pupils, stratified random sampling, linear systematic
sampling, and some forms of cluster sampling.

With Objective 2, each school district's third-graders would
constitute a separate population, and sampling in each school district could
be handled differently. That is, one district might use simple random
sampling, while another might use two-stage cluster sampling of schools
and homerooms, with homerooms stratified by average ability level of
pupils. In practice, use of several different sampling procedures
would make good sense if the districts varied greatly in size. While
cluster sampling would be infeasible in a small school district (say,
one with only three elementary schools), it might prove to be highly
efficient in a state's largest school districts.

To accomplish Objective 2, simple random sampling would be handled
just as it is described for Objective 1. Standard formulas exist for
the estimation of proportions through simple random sampling, as they
do for the estimation of mean square errors (Murthy, 1967; Hansen,
Hurwitz and Madow, 1953).
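A minimal sketch of the standard estimate for a simple random sample drawn without replacement, using the textbook variance estimate with a finite population correction. Reading "three of the five items" as "at least three" is an assumption made here, and the function names, pupil scores and district size are hypothetical.

```python
def achieved_objective(n_correct, required=3):
    """Assumed rule: the objective is achieved with at least `required` items right."""
    return n_correct >= required


def estimate_proportion(items_correct, n_population):
    """Sample proportion achieving the objective and its estimated variance
    under simple random sampling without replacement."""
    n = len(items_correct)
    p = sum(achieved_objective(c) for c in items_correct) / n
    fpc = 1 - n / n_population                 # finite population correction
    variance = fpc * p * (1 - p) / (n - 1)
    return p, variance


# Hypothetical sample: items correct (out of 5) for 10 of a district's 400 third-graders.
items_correct = [5, 3, 2, 4, 1, 3, 0, 5, 4, 2]
p, variance = estimate_proportion(items_correct, n_population=400)
```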

When the objective is estimation of a proportion, stratified
sampling is unlikely to afford appreciable increases in efficiency
over simple random sampling. To be efficient, stratified sampling
requires that variances within strata be much smaller than the variance
within the whole population. The variances of proportions are very
similar, unless the proportions are extremely large or extremely small
(the variances of proportions in the range 0.2 to 0.8 are very similar).
Thus little reduction in the variance of proportions can be gained from
stratification.
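The claim about the range 0.2 to 0.8 can be checked directly, since a 0/1 variable (objective achieved or not) with proportion p has variance p(1 - p). The function name is illustrative.

```python
def element_variance(p):
    """Variance of a 0/1 variable whose proportion of 1's is p."""
    return p * (1 - p)


for p in (0.05, 0.2, 0.35, 0.5, 0.65, 0.8, 0.95):
    print(f"p = {p:.2f}   variance = {element_variance(p):.4f}")
```

Between 0.2 and 0.8 the variance only moves between 0.16 and 0.25, so strata whose proportions fall in that range cannot have variances much smaller than that of the whole population.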

Use of linear systematic sampling is just as reasonable for the
achievement of Objective 2 as it was for the achievement of Objective 1.
The same potential advantages, and the same cautions, apply. A school
district is more likely than a state department of education to have past
test data and other information on individual students. This information
can be used to create ordered sampling frames, permitting systematic
sampling from an ordered list.

Unless a school district is very large, multi-stage cluster sampling
will not be practical. For moderately large school systems (enrollments
of ten thousand to thirty thousand), single-stage cluster sampling of
homerooms is likely to be administratively practical and statistically
efficient for estimation of averages or proportions. Compiling a list
of third-grade homerooms should not be difficult in a district of moderate
size. Sampling by homeroom would permit testing of intact groups of
pupils, and would provide a convenient route for distribution of materials
and handling of assessment materials in the field.

Multiple matrix sampling could also be economical and convenient in
all but the smallest school systems. Shoemaker (1970) has shown that
multiple matrix sampling is useful for estimation of averages, provided
the population is no smaller than 300.

Summary

This paper was intended to help the reader become conversant with
important sampling terms and concepts, and to become aware of sampling
procedures that might be used in a statewide assessment. It was not
intended to create instant sample-design experts or sampling theorists.

If the reader has gained a basic understanding of such terms and
concepts as estimate, estimator, population parameter, estimator bias,
etc., and if some of the sampling options available for statewide
assessments are now intelligible, the paper has accomplished its purpose.

Designing an efficient sample requires knowledge of the science
of sampling. But perhaps more than in other statistically-oriented
disciplines, good sample design is an art. It requires sensitivity
to the nature of the populations of interest, and attention to
information and data that might, to the novice, seem unrelated to the
sampling task at hand. For these reasons, there is no substitute for
experience when truly efficient sample design is desired. Investment
in expert sampling consultation will usually be repaid many times over
by the economies an efficient design provides. But it behooves the
assessment director to be conversant, if not expert, on sampling and
its potentials. By knowing a little about the subject, the right
questions can be asked, and the right data can be provided. The task
of the sample designer will be made easier, and the resulting product
all the better.







REFERENCES 



Burkhead, J., Input and Output in Large City High Schools. New York:
Syracuse University Press, 1967.

Cochran, W. G., Sampling Techniques. New York: John Wiley and Sons,
1963.

Dalenius, T. and J. L. Hodges, Jr., "Minimum variance stratification",
Journal of the American Statistical Association, 54 (1959), 88-101.

Goslin, D., Teachers and Testing. New York: Russell Sage Foundation,
1967.

Hansen, M., W. Hurwitz and W. G. Madow, Sample Survey Methods and Theory.
2 vols. New York: John Wiley and Sons, 1953.

Jaeger, R. M., "Designing school testing programs for institutional
appraisal: an application of sampling theory", Stanford, California:
1970. Unpublished doctoral dissertation.

Knapp, Thomas R., "An application of balanced incomplete block designs
to the estimation of test norms", Educational and Psychological
Measurement, 28 (1968), 265-272.

Lord, F. M., "Sampling fluctuations resulting from the sampling of test
items", Psychometrika, 20 (1955), 1-23.

Lord, F. M., "Estimating norms by item sampling", Educational and
Psychological Measurement, 22 (1962), 259-267.

Mollenkopf, W. G. and S. D. Melville, "A study of secondary school
characteristics as related to test scores", Educational Testing
Service, Research Bulletin 56-6. Princeton: 1956.

Murthy, M. N., Sampling Theory and Methods. Calcutta: Statistical
Publishing Society, 1967.

Shoemaker, D. M., "Allocation of items and examinees in estimating a
norm distribution by item-sampling", Journal of Educational
Measurement, 7 (1970), 123-128.

Shoemaker, D. M., "Further results on the standard errors of estimate
associated with item-examinee sampling procedures", Journal of
Educational Measurement, 8 (1971), 215-220.



APPENDIX A 

Evaluation of Alternative Cluster Sampling Procedures: An Example

When choosing among alternative cluster sampling procedures, the
kinds of theoretical notions discussed in this paper (a procedure will
be more efficient when cluster sizes don't vary much, heterogeneity
within clusters and homogeneity between clusters will provide increased
efficiency, etc.) provide some guidance. In a specific application,
assurance that one is using the best procedure can also be gained through
analysis of data from the school district or state where sampling is to
be used.

Many characteristics of schools, school districts, and groups of
students show remarkable stability from year to year. For example, the
average basic skills achievement of a school's fourth-grade class is
likely to be very similar in two successive years, as is the socio-economic
composition of the school's student body. When searching for a sampling
procedure that provides maximum efficiency, one can take advantage of
this kind of stability. The method is as follows: Use data from the
previous school year to evaluate the efficiency of the sampling procedures
being considered for the current year. Since it is unlikely that sampling
has been used in the past, data will be available for all students,
classes and schools in the district or state. With data available for
the entire population (a situation that will not hold for the current
school year if sampling is used), results of sampling the previous year's
population using a variety of procedures can be readily compared.




An example of this kind of evaluation uses data from a single school
district, called Anydistrict (Jaeger, 1970). For simplicity, computation
of estimates and estimator variances will not be shown; only initial data
and final results will be presented.

The population parameter to be estimated in this example is the
average reading achievement of the district's sixth-graders. The
sixth-grade enrollment of the district is 1180, with 45 sixth-grade classes
in 21 schools. Data available from the previous school year include the
average sixth-grade reading achievement in each school, the sixth-grade
enrollment in each school, and the average verbal ability score of
fifth-graders in each school. These data will be used to evaluate four
alternative cluster sampling and estimation procedures: simple random
sampling of schools with unbiased estimation, simple random sampling of
schools with ratio estimation, sampling of schools with probabilities
proportional to their sixth-grade enrollments (PPS sampling and estimation),
and sampling of schools with probabilities proportional to totals of
fifth-grade ability test scores (PPES sampling and estimation).

The evaluation of each cluster sampling procedure will use data
from the entire population of 21 schools. With these data, estimator
variances can be calculated exactly. It must be emphasized that
data for the entire population will be available only when all
sixth-graders in the district are tested, a situation that will not obtain in
the current school year, when sampling is used. The method, then, is to
use population data from a previous school year to evaluate alternative
sampling procedures, and to assume that the most efficient procedure for
one school year will also be most efficient for the next year. The
assumption is generally sound.
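When prior-year data cover every school, one way to carry out this kind of evaluation is to list every possible sample and average the squared errors, rather than use a variance formula. The sketch below does this for simple random sampling with the unbiased and the ratio estimators. It uses a small hypothetical population of six schools, not the Anydistrict data, and the function names are illustrative.

```python
from itertools import combinations


def exact_mse(population, n, estimator):
    """Mean square error of an estimator over every possible simple random
    sample of n schools (all such samples are equally likely)."""
    n_pupils = sum(m for m, _ in population)
    true_mean = sum(m * y for m, y in population) / n_pupils
    errors = [(estimator(sample, population) - true_mean) ** 2
              for sample in combinations(population, n)]
    return sum(errors) / len(errors)


def unbiased(sample, population):
    """Expand the sampled score totals, then divide by total enrollment."""
    n_pupils = sum(m for m, _ in population)
    expanded = len(population) / len(sample) * sum(m * y for m, y in sample)
    return expanded / n_pupils


def ratio(sample, population):
    """Sampled score total divided by sampled enrollment."""
    return sum(m * y for m, y in sample) / sum(m for m, _ in sample)


# Hypothetical prior-year records: (enrollment, average score) per school.
population = [(30, 62.0), (45, 66.0), (60, 64.0),
              (80, 70.0), (100, 68.0), (120, 71.0)]
mse_unbiased = exact_mse(population, 3, unbiased)
mse_ratio = exact_mse(population, 3, ratio)
```

In this made-up population, where enrollments vary fourfold, the ratio estimator's mean square error is far smaller than that of the unbiased estimator. For 10 of 21 schools there are 352,716 possible samples, so full enumeration remains feasible on a computer, though the formulas cited in the text avoid it.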




The following table shows sixth-grade average reading achievement
scores, sixth-grade enrollments, and average fifth-grade ability
test scores for the 21 schools in the district under study. The data are
real. They were provided by the research office of a medium-sized school
district.



Table A: Sixth-Grade Average Reading Achievement, Sixth-Grade Enrollments,
and Average Fifth-Grade Ability-Test Scores for Elementary Schools in
Anydistrict.

School     Average Grade 6          Grade 6        Average Grade 5
Number     Reading Achievement*     Enrollment     Ability Score

   1            66.11                   56              33.54
   2            66.83                   65              32.96
   3            71.27                   71              38.06
   4            56.09                   58              33.81
   5            64.57                   47              34.29
   6            71.09                   66              37.84
   7            74.89                   55              36.70
   8            70.67                   99              37.69
   9            74.51                   57              39.06
  10            68.13                   40              37.19
  11            70.02                   59              36.10
  12            72.57                   72              39.90
  13            58.86                   43              35.36
  14            66.35                   63              36.20
  15            70.71                   38              36.92
  16            65.82                   51              34.42
  17            70.98                   51              35.15
  18            67.56                   41              33.51
  19            82.21                   29              40.76
  20            65.61                   74              35.02
  21            51.14                   49              30.18

*Average number of test items correct.




The data in Table A were used in formulas for the variance of the
estimated mean, appropriate to each of the four cluster sampling and
estimation procedures. In all cases, it was assumed that 10 of the 21
schools in Anydistrict were sampled, and that all sixth-graders in
sampled schools were tested. The sampling and estimation procedure
that provided the smallest variance was judged to be best.

To evaluate PPS sampling, it was assumed that schools were sampled
with probabilities proportional to their sixth-grade enrollments (the
data in the third column of Table A). To evaluate PPES sampling, a
slightly more complex assumption was made. The measure of "size" used
for a school was equal to the product of the school's sixth-grade
enrollment and the average ability-test score earned by the school's
fifth-graders (the data in columns three and four in Table A). While
this product (sixth-grade enrollment times fifth-grade ability test
score) might not have much meaning as an assessment statistic, it makes
an excellent variable for PPES sampling since it is highly correlated
with the total of sixth-grade reading achievement scores in a school.
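The correlation mentioned here can be checked from the Table A figures. The sketch below takes each school's reading total as enrollment times average reading score and computes an ordinary Pearson correlation with the PPES size measure. The data rows are transcribed from Table A, with the unlabeled row between schools 10 and 12 read as school 11; the variable and function names are illustrative.

```python
from math import sqrt

# (grade 6 enrollment, grade 6 reading average, grade 5 ability average),
# schools 1 to 21 as listed in Table A.
TABLE_A = [
    (56, 66.11, 33.54), (65, 66.83, 32.96), (71, 71.27, 38.06),
    (58, 56.09, 33.81), (47, 64.57, 34.29), (66, 71.09, 37.84),
    (55, 74.89, 36.70), (99, 70.67, 37.69), (57, 74.51, 39.06),
    (40, 68.13, 37.19), (59, 70.02, 36.10), (72, 72.57, 39.90),
    (43, 58.86, 35.36), (63, 66.35, 36.20), (38, 70.71, 36.92),
    (51, 65.82, 34.42), (51, 70.98, 35.15), (41, 67.56, 33.51),
    (29, 82.21, 40.76), (74, 65.61, 35.02), (49, 51.14, 30.18),
]


def pearson(xs, ys):
    """Ordinary product-moment correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


size_measure = [m * ability for m, _, ability in TABLE_A]    # PPES "size"
reading_total = [m * reading for m, reading, _ in TABLE_A]   # variable estimated
r_ppes = pearson(size_measure, reading_total)
```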

The variances of estimators of average sixth-grade achievement in
the district are given in Table B, below:



Table B: Variances of Estimators of Average Achievement for Sixth-Grade
Students in Anydistrict. Sample Size is 10 Schools from a Population of 21.

Sampling and Estimation Method                          Estimator Variance

Simple random sampling of schools with
   unbiased estimation                                        21.790
Simple random sampling of schools with
   ratio estimation                                            1.802
Sampling of schools with probabilities
   proportional to sixth-grade enrollments (PPS)               3.622
Sampling of schools with probabilities
   proportional to fifth-grade ability test scores (PPES)      1.358




From the data in Table B, it is clear that PPES sampling of schools
is the most efficient of the four cluster sampling procedures. PPES
sampling is slightly more efficient than simple random sampling of schools
with ratio estimation, more than twice as efficient as PPS sampling of
schools, and more than sixteen times as efficient as simple random sampling
of schools with unbiased estimation. Efficiency is calculated from the
ratio of estimator variances.
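The efficiency comparison is a matter of dividing variances. The sketch below reads the first two Table B entries as 21.790 and 1.802, the reading consistent with the "sixteen times" statement above; the dictionary labels are shorthand introduced here.

```python
variances = {
    "SRS, unbiased estimation": 21.790,
    "SRS, ratio estimation": 1.802,
    "PPS": 3.622,
    "PPES": 1.358,
}

best = min(variances, key=variances.get)
# How many times larger each variance is than the smallest one.
relative_efficiency = {name: v / variances[best] for name, v in variances.items()}

for name, factor in sorted(relative_efficiency.items(), key=lambda kv: kv[1]):
    print(f"{name:26s} variance is {factor:5.2f} times that of {best}")
```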

Although PPS sampling and PPES sampling are not consistent procedures,
the variances of their estimators do decrease steadily as sample size is
increased. Simple random sampling of clusters with unbiased estimation
or with ratio estimation are consistent, so the variances of their
estimators also become steadily smaller as sample size is increased. Thus one
can generalize from the data in Table B for all sample sizes that are
substantially smaller than the population size. PPES sampling will be most
efficient, simple random sampling of schools with ratio estimation will
be next most efficient, PPS sampling will rank third in efficiency, and
simple random sampling of schools with unbiased estimation will be very
inefficient.

The formulas used to calculate estimator variances in this example
can be found in many sampling texts, including Murthy (1967), Cochran
(1963) and Hansen, Hurwitz and Madow (1953).



